Andrew Duncan: Bioinformatics


Thursday, 9 July 2015

Pulling out individual chromosomes from fasta file

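This is probably only useful in certain cases, but I thought I would share.

Using sed, you can grab a single chromosome from a fasta file, provided the next chromosome (chr2 here) directly follows it in the file.

sed -n '/^>chr1$/,/^>chr2$/p' <fasta> | sed '$d'

The patterns are anchored because a bare >chr1 also matches >chr10 through >chr19, which would start the range a second time. If your header lines carry a description after the name, adjust the patterns to match. The range includes the >chr2 header line itself, so the trailing sed '$d' drops it.

Note that you can use this to grab multiple consecutive chromosomes too. This prints chr1 through chr3, assuming they appear in that order in the file:

sed -n '/^>chr1$/,/^>chr4$/p' <fasta> | sed '$d'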
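
The sed range above depends on the order of the chromosomes in the file and on how the header text happens to match. Here is a minimal awk sketch that picks a record by its exact name instead. It assumes an uncompressed fasta file in which the name is the text between '>' and the first whitespace; extract_chrom is just a name I made up for the helper.

```shell
# extract_chrom NAME FASTA
# Print the single fasta record whose header name is exactly NAME.
# (Hypothetical helper; the name is whatever follows '>' up to the
# first whitespace on the header line.)
extract_chrom() {
    awk -v name="$1" '
        /^>/ {
            # Header line: keep this record only on an exact name match,
            # so chr1 does not also select chr10, chr11, ...
            keep = ($1 == (">" name))
        }
        keep { print }
    ' "$2"
}
```

Calling extract_chrom chr1 <fasta> prints the chr1 header and its sequence lines and nothing else, wherever chr1 sits in the file. It still reads the whole file, so it is not faster than the sed version, only less fragile.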

Get the nth base from a certain chromosome in a fasta file

I was trying to write my own tool to do this, but I doubt I could make it run as fast as an existing tool.

Turns out samtools does the trick.

http://seqanswers.com/forums/showthread.php?t=17315

samtools faidx <fasta.fa> <seq>:<pos>-<pos>

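For example, to get the 6078th base of chr3 (positions are 1-based and the range is inclusive; the output is a small fasta record with its own header line):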
samtools faidx <fasta.fa> chr3:6078-6078
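
For comparison, here is roughly what a hand-rolled version looks like in awk. This is only a sketch of the idea, not what samtools does internally: it scans the file line by line with no index, so I would expect it to be much slower on a large genome. It assumes an uncompressed fasta file and a 1-based position; nth_base is a made-up helper name.

```shell
# nth_base NAME POS FASTA
# Print the base at 1-based position POS of the record named NAME.
# Exits with status 1 if the record or the position is not found.
# (Hypothetical helper; a plain linear scan, no index.)
nth_base() {
    awk -v name="$1" -v pos="$2" '
        /^>/ {
            if (found) exit              # ran past the record we wanted
            found = ($1 == (">" name))   # exact match on the header name
            seen = 0                     # bases counted so far in this record
            next
        }
        found {
            len = length($0)
            if (pos >= 1 && pos <= seen + len) {
                # The requested base lies on this sequence line.
                print substr($0, pos - seen, 1)
                printed = 1
                exit
            }
            seen += len
        }
        END { if (!printed) exit 1 }
    ' "$3"
}
```

Usage would be nth_base chr3 6078 <fasta.fa>, which prints just the base with no header line.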

Tuesday, 7 July 2015

Pulling sections out of FastQC output file

This is fairly straightforward. The text output file generated by FastQC (fastqc_data.txt) already puts a divider at the start and end of each module. They look like the following:
>>Basic Statistics
>>END_MODULE

This is useful if we want to separate the data by module. You can do so with the following script.

https://github.com/agduncan94/BioinformaticTools/blob/master/grabFastQCOutput.pl

I'm sure there are better ways, but if you just want to do it quickly then this will work.
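
If you would rather not download anything, the same idea fits in a few lines of awk. This is a separate sketch, not a port of the Perl script linked above. It assumes the module name is the first tab-separated field on the ">>" line (FastQC puts the pass/warn/fail status after a tab), and fastqc_module is a made-up helper name.

```shell
# fastqc_module "Module name" fastqc_data.txt
# Print the lines between ">>Module name" and the next ">>END_MODULE",
# leaving out the two divider lines themselves.
# (Hypothetical helper.)
fastqc_module() {
    awk -F '\t' -v module="$1" '
        /^>>END_MODULE/        { inside = 0 }   # closing divider: stop printing
        inside                 { print }        # body line of the wanted module
        $1 == (">>" module)    { inside = 1 }   # opening divider: start after it
    ' "$2"
}
```

For example, fastqc_module "Basic Statistics" fastqc_data.txt prints only the rows of the Basic Statistics table, ready to redirect into its own file.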
